A comparison of estimated and MAP-predicted formants and fundamental frequencies with a speech reconstruction application

Authors

Jonathan Darch

Ben P. Milner

Abstract

This work compares the accuracy of fundamental frequency and formant frequency estimation methods and maximum a posteriori (MAP) prediction from MFCC vectors with hand-corrected references. Five fundamental frequency estimation methods are compared to fundamental frequency prediction from MFCC vectors in both clean and noisy speech. Similarly, three formant frequency estimation and prediction methods are compared. An analysis of estimation and prediction accuracy shows that prediction from MFCCs provides the most accurate voicing classification across clean and noisy speech. On clean speech, fundamental frequency estimation outperforms prediction from MFCCs, but as noise increases the performance of prediction is significantly more robust than estimation. Formant frequency prediction is found to be more accurate than estimation in both clean and noisy speech. A subjective analysis of the estimation and prediction methods is also made by reconstructing speech from the acoustic features.
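As a rough illustration of what prediction from MFCC vectors can look like, the sketch below fits the general pattern of MAP prediction with a joint Gaussian mixture model: pick the mixture component with the highest posterior probability given the input vector, then return that component's conditional mean for the target feature. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names (`log_gauss`, `map_predict`), the parameter layout (input dimensions first, target dimensions last) and the toy numbers are all hypothetical and are not taken from the paper.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log-density of a multivariate Gaussian evaluated at x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def map_predict(x, weights, means, covs, dx):
    """Predict target features y from input x with a joint GMM over [x, y].

    Assumed (hypothetical) layout: the first dx entries of each mean and
    covariance belong to x (e.g. an MFCC vector), the rest to y
    (e.g. formant frequencies or fundamental frequency).
    """
    # Posterior score of each component given x, using the x-marginal.
    log_post = np.array([
        np.log(w) + log_gauss(x, m[:dx], c[:dx, :dx])
        for w, m, c in zip(weights, means, covs)
    ])
    k = int(np.argmax(log_post))  # most probable component
    m, c = means[k], covs[k]
    # Conditional mean of y given x within component k.
    y = m[dx:] + c[dx:, :dx] @ np.linalg.solve(c[:dx, :dx], x - m[:dx])
    return y, k

# Toy model: one input dimension, one target dimension, two components.
weights = np.array([0.5, 0.5])
means = [np.array([0.0, 100.0]), np.array([10.0, 200.0])]
covs = [np.array([[1.0, 0.5], [0.5, 1.0]]), np.eye(2)]

y, k = map_predict(np.array([1.0]), weights, means, covs, dx=1)
print(k, y)  # component 0, prediction 100.5
```

Selecting a single component keeps the example short. A common alternative is to weight every component's conditional mean by its posterior probability, which trades a hard decision for a smoother prediction.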

Similar resources

Formants Estimation Techniques for Speech Analysis

Measuring formant frequencies in speech signals is indispensable for research yet technically problematic. Accurate measurement of formant frequencies is important in many studies of speech perception and production. Unfortunately, no fully effective method exists for obtaining good estimates of these frequencies. This paper presents a comparative study of two techniques of speech parameteriz...

Impact of Novel Incorporation of CT-based Segment Mapping into a Conjugated Gradient Algorithm on Bone SPECT Imaging: Fundamental Characteristics of a Context-specific Reconstruction Method

Objective(s): The latest single-photon emission computed tomography (SPECT)/computed tomography (CT) reconstruction system, referred to as xSPECT Bone™, is a context-specific reconstruction system utilizing tissue segmentation information from CT data, which is called a zone map. The aim of this study was to evaluate the effects of zone-map enhancement incorporated into the ordered-subset conjug...

Speech Enhancement Using Gaussian Mixture Models, Explicit Bayesian Estimation and Wiener Filtering

Gaussian Mixture Models (GMMs) of power spectral densities of speech and noise are used with explicit Bayesian estimations in Wiener filtering of noisy speech. No assumption is made on the nature or stationarity of the noise. No voice activity detection (VAD) or any other means is employed to estimate the input SNR. The GMM mean vectors are used to form sets of over-determined system of equatio...

A Study of the Formant Structure of Persian Vowels in Azeri-Persian Bilingual Adults

Objective: Vowels are the center of syllables while formant structures are one of the most important acoustic characteristics of speech sounds that help in their articulatory and perceptual aspects. Formants represent the shape and size of the vocal tract. There exist trivial differences between the vocal tracts of different people due to which the formant structures of a vowel in one person ar...

Predicting Formant Frequencies from MFCC Vectors

This work proposes a novel method of predicting formant frequencies from a stream of mel-frequency cepstral coefficients (MFCC) feature vectors. Prediction is based on modelling the joint density of MFCCs and formant frequencies using a Gaussian mixture model (GMM). Using this GMM and an input MFCC vector, two maximum a posteriori (MAP) prediction methods are developed. The first method predict...

Publication date: 2007
